A Data-Driven Approach Using the NFL Big Data Bowl Dataset and Advanced Machine Learning Techniques
Rows: 393,536
Technique: Group Splitting
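
Group splitting keeps every row that shares a group identifier on the same side of the train/test split, so correlated frames cannot leak between sets. The following is a minimal sketch using only the Python standard library. The grouping key (game and play) and the `(gameId, playId, frameId)` row layout are assumptions for illustration, since the grouping variable is not stated here, and `group_train_test_split` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def group_train_test_split(rows, group_key, test_fraction=0.2, seed=42):
    """Split rows so that all rows sharing a group land on the same side."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)

    # Shuffle the group identifiers, not the individual rows.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


# Hypothetical frame-level rows: (gameId, playId, frameId)
frames = [
    (game, play, frame)
    for game in (1, 2)
    for play in range(1, 6)
    for frame in range(1, 4)
]

# Assumption: one group per play, identified by (gameId, playId)
train_rows, test_rows = group_train_test_split(
    frames, group_key=lambda r: (r[0], r[1])
)
train_plays = {(g, p) for g, p, _ in train_rows}
test_plays = {(g, p) for g, p, _ in test_rows}
```

Because the shuffle is applied to group identifiers, no play contributes frames to both sets, which a row-level random split would not guarantee.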
Factors to Consider:
- Tackle (0/1)
- Future X/Y
- Speed, acceleration, orientation, and direction (S/A/O/Dir) of the defender
- Position / alignment cluster interaction
- Number of defenders in the box
- Current and future (0.5 seconds ahead) location of the ball
- S/A/O/Dir of the ball carrier
- Velocity/direction difference
- Ball in the defensive player's "fan"
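
Several of the factors above are geometric and can be derived from tracking columns. The sketch below shows one possible way to compute a 0.5-second projected location, a direction difference, and a "ball in fan" flag. It assumes angles are in degrees with 0 pointing along +y and increasing clockwise, uses a constant-velocity projection, and defines the "fan" as a cone of ±45° around the defender's orientation. The 45° half-angle and all function names are hypothetical, as the exact definitions are not given here.

```python
import math


def velocity_components(speed, direction_deg):
    """Convert speed and heading to (vx, vy); 0 deg = +y, clockwise (assumed)."""
    theta = math.radians(direction_deg)
    return speed * math.sin(theta), speed * math.cos(theta)


def project_position(x, y, speed, direction_deg, dt=0.5):
    """Constant-velocity estimate of the position dt seconds ahead."""
    vx, vy = velocity_components(speed, direction_deg)
    return x + vx * dt, y + vy * dt


def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in [0, 180]."""
    diff = abs(a_deg - b_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def ball_in_fan(def_x, def_y, def_orientation_deg, ball_x, ball_y,
                half_angle_deg=45.0):
    """True when the ball lies within +/- half_angle of the defender's facing."""
    # Bearing from defender to ball in the same clockwise-from-+y convention.
    bearing = math.degrees(math.atan2(ball_x - def_x, ball_y - def_y)) % 360.0
    return angle_difference(def_orientation_deg, bearing) <= half_angle_deg


# Hypothetical ball carrier at (10, 20) moving at 4 yd/s along +x
future_x, future_y = project_position(10.0, 20.0, 4.0, 90.0)

# Hypothetical defender at the origin facing +x, ball 5 yards away along +x
facing_ball = ball_in_fan(0.0, 0.0, 90.0, 5.0, 0.0)
```

The same `angle_difference` helper can serve the velocity/direction difference factor by comparing the defender's and ball carrier's `dir` values.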
Concerns:
- Computational time
- Limited tuning parameters
- Limited data for train/test/validation
Penalized regression: the best parameters are Lambda = 0.01269 and Alpha = 0.00001, with an accuracy of 90.91%.
Random forest: the best parameters are Mtry = 7, Min_n = 6, and Trees = 278, with an accuracy of 92.87%.
Boosted trees: the best parameters are Trees = 219, Min_n = 9, Tree Depth = 1, Learn Rate = 1.2, Loss Reduction = 24, and Sample Size = 1, with an accuracy of 92.87%.
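from tensorflow.keras.layers import BatchNormalization, Dense, Dropout, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2

def build_model(input_shape):
    # Three 64-unit hidden blocks, then a sigmoid output for the binary tackle label
    model = Sequential([
        Input(shape=(input_shape,)),  # input_shape is the number of features
        Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
        BatchNormalization(),  # normalizes layer inputs to stabilize and accelerate training
        Dropout(0.3),  # randomly deactivates neurons to reduce overfitting
        Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
        BatchNormalization(),
        Dropout(0.3),
        Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
        BatchNormalization(),
        Dropout(0.3),
        Dense(1, activation='sigmoid', kernel_regularizer=l2(0.001)),  # L2 regularization on the output layer too
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

Tackles Over Expected = \(\sum_{i=1}^{N} (\mathbb{I}_{\text{tackle}_i} - P(\text{tackle}_i))\)

Where:
- \(\mathbb{I}_{\text{tackle}_i}\) is 1 if the defender made the tackle on opportunity \(i\) and 0 otherwise
- \(P(\text{tackle}_i)\) is the model's predicted probability of that tackle
- \(N\) is the number of opportunities evaluated for the defender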
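
The Tackles Over Expected sum can be illustrated with a short, self-contained sketch. It accumulates the actual tackle indicator minus the predicted probability for each defender and then ranks defenders by the total. The player names, probabilities, and the `tackles_over_expected` helper are hypothetical and are not taken from the results reported here.

```python
from collections import defaultdict


def tackles_over_expected(records):
    """Sum (actual tackle indicator - predicted probability) per player."""
    totals = defaultdict(float)
    for player, made_tackle, predicted_prob in records:
        totals[player] += made_tackle - predicted_prob
    return dict(totals)


# Hypothetical opportunities: (player, tackle made 0/1, model probability)
sample = [
    ("Player A", 1, 0.25),
    ("Player A", 1, 0.5),
    ("Player A", 0, 0.25),
    ("Player B", 0, 0.75),
    ("Player B", 1, 0.5),
]

toe = tackles_over_expected(sample)

# Rank defenders from most to fewest tackles over expected.
ranking = sorted(toe.items(), key=lambda item: item[1], reverse=True)
```

A positive total means the defender converted more opportunities than the model expected, and a negative total means fewer.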

Penalized Regression (Accuracy: 90.91%)

| Display Name | Tackles Over Expected |
|---|---|
| Talanoa Hufanga | 7.10 |
| Jonathan Owens | 5.76 |
| Marcus Epps | 4.66 |
| Grover Stewart | 3.89 |
| Nicholas Morrow | −5.06 |
| Divine Deablo | −5.55 |
| Christian Kirksey | −5.61 |
| Myles Hartsfield | −5.91 |

Random Forest (Accuracy: 92.27%)

| Display Name | Tackles Over Expected |
|---|---|
| Talanoa Hufanga | 4.92 |
| Maxx Crosby | 3.89 |
| Jonathan Owens | 3.87 |
| Cameron Jordan | 3.79 |
| Xavier McKinney | −2.80 |
| Demario Davis | −2.91 |
| Damien Wilson | −3.15 |
| Cody Barton | −3.76 |

Boosted Trees (Accuracy: 92.05%)

| Display Name | Tackles Over Expected |
|---|---|
| Talanoa Hufanga | 6.12 |
| Jonathan Owens | 4.58 |
| Maxx Crosby | 4.16 |
| Cameron Jordan | 4.03 |
| Damien Wilson | −3.66 |
| Christian Kirksey | −3.97 |
| Cody Barton | −5.08 |
| Demario Davis | −5.57 |

Neural Net (Accuracy: 92.92%)

| Display Name | Tackles Over Expected |
|---|---|
| Jonathan Owens | 4.79 |
| Jihad Ward | 3.71 |
| C.J. Mosley | 3.58 |
| Grover Stewart | 3.43 |
| Bradley Roby | −1.85 |
| Roy Lopez | −1.90 |
| Tyrann Mathieu | −1.93 |
| Marcus Davenport | −2.19 |